
Saber: window-based hybrid stream processing for heterogeneous architectures

Author: Costa P
Fernandez R
Koliousis A
Pietzuch P
Weidlich M
Wolf A
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/09/2015

Modern servers have become heterogeneous, often combining multicore CPUs with many-core GPGPUs. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not supported by current relational stream processing engines. For an engine to exploit a heterogeneous architecture, it must execute streaming SQL queries with sufficient data-parallelism to fully utilise all available heterogeneous processors, and decide how to use each in the most effective way. It must do this while respecting the semantics of streaming SQL queries, in particular with regard to window handling. We describe SABER, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. SABER executes window-based streaming SQL queries in a data-parallel fashion using all available CPU and GPGPU cores. Instead of statically assigning query operators to heterogeneous processors, SABER employs a new adaptive heterogeneous lookahead scheduling strategy, which increases the share of queries executing on the processor that yields the highest performance. To hide data movement costs, SABER pipelines the transfer of stream data between different memory types and the CPU/GPGPU. Our experimental comparison against state-of-the-art engines shows that SABER increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with small and large window sizes.

Spiral - Imperial College Digital Repository
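As a rough illustration of the window-based data-parallelism described in the SABER abstract, the sketch below splits a stream into count-based sliding windows and evaluates a SUM aggregate over each window as an independent task. It is a minimal sketch assuming count-based windows and a thread pool standing in for CPU and GPGPU cores; `window_bounds` and `windowed_sums` are hypothetical names, and this is neither SABER's implementation nor its lookahead scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

def window_bounds(n_tuples: int, size: int, slide: int) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) index ranges of complete count-based sliding windows."""
    start = 0
    while start + size <= n_tuples:
        yield start, start + size
        start += slide

def windowed_sums(stream: List[int], size: int, slide: int, workers: int = 4) -> List[int]:
    """Evaluate SUM over every complete window. Windows are independent tasks,
    so they can be handed to a pool of workers in parallel."""
    bounds = list(window_bounds(len(stream), size, slide))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in window order, regardless of completion order
        return list(pool.map(lambda b: sum(stream[b[0]:b[1]]), bounds))
```

For example, `windowed_sums([1, 2, 3, 4, 5, 6], size=3, slide=1)` returns `[6, 9, 12, 15]`, one sum per complete window. Because CPython threads share one interpreter lock, this shows only the task decomposition, not a real speedup.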

CloudScope: diagnosing and managing performance interference in multi-tenant clouds

Author: Chen X
Franciosi F
Knottenbelt W
Osman R
Pietzuch P
Rupprecht L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/10/2015

© 2015 IEEE. Virtual machine consolidation is attractive in cloud computing platforms for several reasons, including reduced infrastructure costs, lower energy consumption and ease of management. However, the interference between co-resident workloads caused by virtualization can violate the service level objectives (SLOs) that the cloud platform guarantees. Existing solutions to minimize interference between virtual machines (VMs) are mostly based on comprehensive micro-benchmarks or online training, which makes them computationally intensive. In this paper, we present CloudScope, a system for diagnosing interference in multi-tenant cloud systems in a lightweight way. CloudScope employs a discrete-time Markov chain model for the online prediction of performance interference of co-resident VMs. It uses the results to optimally (re)assign VMs to physical machines and to optimize the hypervisor configuration, e.g. the CPU share it can use, for different workloads. We have implemented CloudScope on top of the Xen hypervisor and conducted experiments using a set of CPU-, disk- and network-intensive workloads and a real system (MapReduce). Our results show that CloudScope's interference prediction achieves an average error of 9%. The interference-aware scheduler improves VM performance by up to 10% compared to the default scheduler. In addition, the hypervisor reconfiguration can improve network throughput by up to 30%.

Spiral - Imperial College Digital Repository
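The CloudScope abstract mentions a discrete-time Markov chain for predicting interference. The sketch below shows only the generic DTMC step (multiplying a state distribution by a transition matrix) for a hypothetical two-state model, "normal" and "interfered"; the states and transition probabilities are invented for illustration and are not CloudScope's model.

```python
from typing import List

Matrix = List[List[float]]

def step(dist: List[float], P: Matrix) -> List[float]:
    """One DTMC step: new[j] = sum over i of dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def predict(dist: List[float], P: Matrix, k: int) -> List[float]:
    """State distribution after k steps, starting from `dist`."""
    for _ in range(k):
        dist = step(dist, P)
    return dist

# Hypothetical transitions between state 0 ("normal") and state 1 ("interfered").
P = [[0.9, 0.1],
     [0.5, 0.5]]

if __name__ == "__main__":
    print(predict([1.0, 0.0], P, 1))    # [0.9, 0.1]
    print(predict([1.0, 0.0], P, 200))  # approaches the steady state [5/6, 1/6]
```

For this matrix the stationary distribution solves pi = pi P, giving pi = [5/6, 1/6], so the model predicts that a VM spends about 17% of its time in the interfered state in the long run.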

Scalable and Fault-tolerant Stateful Stream Processing.

Author: Fernandez R. C.
Kalyvianaki E.
Migliavacca M.
Pietzuch P.
Publication date: 01/01/2013

As users of "big data" applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the "pay-as-you-go" model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs, so systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures.

CiteSeerX

City Research Online

Dagstuhl Research Online Publication Server
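The recovery and scale-out steps in this abstract (checkpoint operator state to an upstream node, replay unprocessed tuples, partition checkpointed state across new nodes) can be sketched in a few lines. This is a single-process toy assuming a key-counting operator and CRC32-based key partitioning; the class and function names are hypothetical and do not reflect the authors' system API.

```python
import copy
import zlib
from typing import Dict, List, Optional

class CountingOperator:
    """Stateful operator whose state (per-key counts) is explicit and externalisable."""
    def __init__(self, state: Optional[Dict[str, int]] = None):
        self.state = dict(state or {})

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1

class Upstream:
    """Hypothetical upstream node: holds the latest checkpoint and buffers tuples for replay."""
    def __init__(self):
        self.checkpoint: Dict[str, int] = {}
        self.buffer: List[str] = []

    def send(self, op: CountingOperator, key: str) -> None:
        self.buffer.append(key)  # keep until a checkpoint covers it
        op.process(key)

    def take_checkpoint(self, op: CountingOperator) -> None:
        self.checkpoint = copy.deepcopy(op.state)
        self.buffer.clear()  # tuples reflected in the checkpoint no longer need replay

    def recover(self) -> CountingOperator:
        """Restore checkpointed state on a fresh operator, then replay buffered tuples."""
        op = CountingOperator(self.checkpoint)
        for key in self.buffer:
            op.process(key)
        return op

def partition_state(state: Dict[str, int], n: int) -> List[Dict[str, int]]:
    """Split checkpointed state across n new operator instances by key (scale out)."""
    parts: List[Dict[str, int]] = [{} for _ in range(n)]
    for key, value in state.items():
        parts[zlib.crc32(key.encode()) % n][key] = value
    return parts
```

For example, after sending "a", "a", "b", taking a checkpoint, and then sending "b", "c", `recover()` rebuilds `{"a": 2, "b": 2, "c": 1}` from the checkpoint plus the two replayed tuples.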


Making State Explicit for Imperative Big Data Processing

Author: Fernandez R. C.
Kalyvianaki E.
Migliavacca M.
Pietzuch P.
Publication date: 01/01/2014

Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to recover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explicitly separating data from mutable state, SDGs have specific features to enable this translation: to ensure scalability, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for independent computation, which are reconciled according to application semantics. For fault tolerance, large in-memory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks.

City Research Online

Kent Academic Repository

Spiral - Imperial College Digital Repository
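The SDG abstract distinguishes partitioned state (each key owned by one node) from partial state (each node keeps its own local instance, later reconciled according to application semantics). The sketch below contrasts the two for a word-count-like workload. The merge-by-addition rule is an assumption that suits counters and would differ for other applications; the function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Dict, List

def partitioned_counts(records: List[str], nodes: int) -> List[Dict[str, int]]:
    """Partitioned state: each key is owned by exactly one node, so no merge is needed."""
    parts: List[Dict[str, int]] = [{} for _ in range(nodes)]
    for r in records:
        part = parts[zlib.crc32(r.encode()) % nodes]
        part[r] = part.get(r, 0) + 1
    return parts

def partial_counts(chunks: List[List[str]]) -> Counter:
    """Partial state: each node counts its own chunk independently,
    then the local instances are reconciled (here, by summing counts)."""
    local_instances = [Counter(chunk) for chunk in chunks]
    merged: Counter = Counter()
    for local in local_instances:
        merged.update(local)  # Counter.update adds counts rather than replacing them
    return merged
```

Partitioning avoids any reconciliation but requires routing every record to its key's owner; partial state lets each node work on whatever data it holds, at the cost of a merge step whose correctness depends on the application.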

FLICK: developing and running application-specific network services

Author: Alim A
Clegg RG
Costa P
Crowcroft J
Koleini M
Madhavapeddy A
Mai L
McAuley D
Migliavacca M
Moore A
Mortier R
Oviedo L
Pietzuch P
Rupprecht L
Seckler E
Sultana N
Wolf AL
Publication venue: USENIX Association
Publication date: 30/05/2016

Data centre networks are increasingly programmable, with application-specific network services proliferating, from custom load-balancers to middleboxes providing caching and aggregation. Developers must currently implement these services using traditional low-level APIs, which neither support natural operations on application data nor provide efficient performance isolation. We describe FLICK, a framework for the programming and execution of application-specific network services on multi-core CPUs. Developers write network services in the FLICK language, which offers high-level processing constructs and application-relevant data types. FLICK programs are translated automatically to efficient, parallel task graphs, implemented in C++ on top of a user-space TCP stack. Task graphs have bounded resource usage at runtime, which means that the graphs of multiple services can execute concurrently without interference using cooperative scheduling. We evaluate FLICK with several services (an HTTP load-balancer, a Memcached router and a Hadoop data aggregator), showing that it achieves good performance while reducing development effort.

Spiral - Imperial College Digital Repository
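The FLICK abstract says task graphs have bounded resource usage, so several services can share cores under cooperative scheduling. The Python sketch below approximates that idea with generators: each task processes at most `batch` items and then yields control back to a round-robin scheduler. The task names and the batch bound are hypothetical, and FLICK itself is implemented in C++ rather than like this.

```python
from collections import deque
from typing import Deque, Generator, List, Tuple

Log = List[Tuple[str, int]]
Task = Generator[None, None, None]

def task(name: str, items: List[int], log: Log, batch: int = 2) -> Task:
    """A service task that handles at most `batch` items before yielding."""
    for i in range(0, len(items), batch):
        for item in items[i:i + batch]:
            log.append((name, item))  # stand-in for real per-item work
        yield  # bounded work done: give the core back to the scheduler

def run_cooperatively(tasks: List[Task]) -> None:
    """Round-robin scheduler: resume each task in turn until all have finished."""
    queue: Deque[Task] = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
            queue.append(current)
        except StopIteration:
            pass  # task finished; drop it from the queue
```

With a load-balancer task over `[1, 2, 3]` and a router task over `[10, 20]`, the log interleaves their work in chunks of at most two items, so neither service can monopolise the core.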

TaLoS: secure and transparent TLS termination inside SGX enclaves

Author: Aublin P-L
Eyers D
Fetzer C
Kelbert F
Krahn R
Lind J
Muthukumaran D
O'Keeffe D
Pietzuch P
Priebe C
Publication venue: Department of Computing, Imperial College London
Publication date: 01/01/2017

We introduce TaLoS, a drop-in replacement for existing transport layer security (TLS) libraries that protects itself from a malicious environment by running inside an Intel SGX trusted execution environment. By minimising the number of enclave transitions and reducing the overhead of the remaining ones, TaLoS imposes an overhead of no more than 31% in our evaluation with the Apache web server and the Squid proxy.

Spiral - Imperial College Digital Repository
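The TaLoS abstract attributes its low overhead to making fewer enclave transitions. As a generic illustration only (batching is a swapped-in technique here, not necessarily TaLoS's mechanism, and no SGX SDK calls are involved), the sketch below counts simulated boundary crossings when requests are sent one per crossing versus in batches. `SimulatedEnclave` and the helper functions are hypothetical.

```python
from typing import Callable, List

class SimulatedEnclave:
    """Hypothetical stand-in for an enclave boundary: counts crossings instead of performing them."""
    def __init__(self, handler: Callable[[bytes], bytes]):
        self.handler = handler
        self.transitions = 0

    def call(self, request: bytes) -> bytes:
        self.transitions += 1  # one crossing per request
        return self.handler(request)

    def call_batch(self, requests: List[bytes]) -> List[bytes]:
        self.transitions += 1  # one crossing for the whole batch
        return [self.handler(r) for r in requests]

def process_unbatched(enclave: SimulatedEnclave, requests: List[bytes]) -> List[bytes]:
    return [enclave.call(r) for r in requests]

def process_batched(enclave: SimulatedEnclave, requests: List[bytes], batch_size: int) -> List[bytes]:
    out: List[bytes] = []
    for i in range(0, len(requests), batch_size):
        out.extend(enclave.call_batch(requests[i:i + batch_size]))
    return out
```

In this toy model, 10 requests cost 10 crossings unbatched and 3 with `batch_size=4`; in a real system, the trade-off is that waiting to fill a batch can add latency.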

Glamdring: automatic application partitioning for Intel SGX

Author: Aublin P
Eyers D
Fetzer C
Goltzsche D
Kapitza R
Kelbert F
Lind J
Muthukumaran D
O'Keeffe D
Pietzuch P
Priebe C
Reiher T
Publication venue: USENIX
Publication date: 21/04/2017

Trusted execution support in modern CPUs, as offered by Intel SGX enclaves, can protect applications in untrusted environments. While prior work has shown that legacy applications can run in their entirety inside enclaves, this results in a large trusted computing base (TCB). Instead, we explore an approach in which we partition an application and use an enclave to protect only security-sensitive data and functions, thus obtaining a smaller TCB. We describe Glamdring, the first source-level partitioning framework that secures applications written in C using Intel SGX. A developer first annotates security-sensitive application data. Glamdring then automatically partitions the application into untrusted and enclave parts: (i) to preserve data confidentiality, Glamdring uses dataflow analysis to identify functions that may be exposed to sensitive data; (ii) for data integrity, it uses backward slicing to identify functions that may affect sensitive data. Glamdring then places security-sensitive functions inside the enclave, and adds runtime checks and cryptographic operations at the enclave boundary to protect it from attack. Our evaluation of Glamdring with the Memcached store, the LibreSSL library, and the Digital Bitbox bitcoin wallet shows that it achieves small TCB sizes and has acceptable performance overheads.

Spiral - Imperial College Digital Repository
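Glamdring's two analyses can be approximated, very loosely, as graph reachability: forward from annotated sensitive data for confidentiality, and backward towards it for integrity. The sketch below runs both over a hypothetical data-flow graph of function names; real dataflow analysis and backward slicing operate on program statements and are far more precise than this.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]

def reachable(graph: Graph, starts: Iterable[str]) -> Set[str]:
    """All nodes reachable from `starts` (including them), via breadth-first search."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph: Graph) -> Graph:
    """Flip every edge so that backward dependencies become forward ones."""
    rev: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev

def enclave_functions(flow: Graph, sensitive: List[str]) -> Set[str]:
    exposed = reachable(flow, sensitive)             # confidentiality: may observe sensitive data
    affecting = reachable(reverse(flow), sensitive)  # integrity: may influence sensitive data
    return (exposed | affecting) - set(sensitive)

# Hypothetical program: an edge A -> B means data flows from A into B.
flow: Graph = {
    "read_input": ["parse"],
    "parse": ["secret_key"],
    "secret_key": ["sign"],
    "sign": ["send"],
    "log_stats": [],
}
```

Here `enclave_functions(flow, ["secret_key"])` selects `read_input`, `parse`, `sign` and `send`, while `log_stats` stays outside the enclave because no data path connects it to `secret_key`.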

Faces in the Clouds: Long-Duration, Multi-User, Cloud-Assisted Video Conferencing

Author: Clegg RG
Griffin D
Hughes P
Kegel I
Landa R
Pietzuch P
Rio M
Stevens T
Williams D
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/02/2017

Multi-user video conferencing is a ubiquitous technology. Increasingly, end-hosts in a conference are assisted by cloud-based servers that improve the quality of experience for end users. This paper evaluates the impact of strategies for placement of such servers on user experience and deployment cost. We consider scenarios based upon the Amazon EC2 infrastructure as well as future scenarios in which cloud instances can be located at a larger number of possible sites across the planet. We compare a number of possible strategies for choosing which cloud locations should host services and how traffic should route through them. Our study is driven by real data to create demand scenarios with realistic geographical user distributions and diurnal behaviour. We conclude that on the EC2 infrastructure a well-chosen static selection of servers performs well, but as more cloud locations become available, a dynamic choice of servers becomes important.

Crossref

UCL Discovery

Spiral - Imperial College Digital Repository

Queen Mary Research Online
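To make the placement question in this abstract concrete, the sketch below picks a static set of k cloud sites with a greedy heuristic that minimises average user-to-nearest-server latency. Both the greedy strategy and the latency figures are illustrative assumptions; the paper compares its own set of strategies, driven by real demand data.

```python
from typing import Dict, List

Latency = Dict[str, Dict[str, int]]  # latency[user][site] in milliseconds

def avg_latency(users: List[str], sites: List[str], latency: Latency) -> float:
    """Mean latency when every user connects to its nearest chosen site."""
    return sum(min(latency[u][s] for s in sites) for u in users) / len(users)

def greedy_place(users: List[str], candidates: List[str], latency: Latency, k: int) -> List[str]:
    """Add, one site at a time, the candidate that most reduces average latency."""
    chosen: List[str] = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = min(remaining, key=lambda c: avg_latency(users, chosen + [c], latency))
        chosen.append(best)
    return chosen

# Hypothetical round-trip latencies (ms) from three user groups to three cloud regions.
latency: Latency = {
    "u1": {"eu": 10, "us": 80, "ap": 150},
    "u2": {"eu": 90, "us": 15, "ap": 120},
    "u3": {"eu": 20, "us": 70, "ap": 160},
}
```

With these numbers, one site gives `["eu"]` (average 40 ms) and two sites give `["eu", "us"]` (average 15 ms). A static choice like this is what the abstract finds adequate on EC2; a dynamic strategy would re-run the selection as user demand shifts.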

FLICK: Developing and running application-specific network services

Author: Alim A
Assoc USENIX
Clegg RG
Costa P
Crowcroft J
Koleini M
Madhavapeddy A
Mai L
McAuley D
Migliavacca M
Moore AW
Mortier R
Oviedo L
Pietzuch P
Rupprecht L
Seckler E
Sultana N
Wolf AL
Publication venue: Proceedings of the 2016 USENIX Annual Technical Conference, USENIX ATC 2016
Publication date: 01/01/2016

Data centre networks are increasingly programmable, with application-specific network services proliferating, from custom load-balancers to middleboxes providing caching and aggregation. Developers must currently implement these services using traditional low-level APIs, which neither support natural operations on application data nor provide efficient performance isolation. We describe FLICK, a framework for the programming and execution of application-specific network services on multi-core CPUs. Developers write network services in the FLICK language, which offers high-level processing constructs and application-relevant data types. FLICK programs are translated automatically to efficient, parallel task graphs, implemented in C++ on top of a user-space TCP stack. Task graphs have bounded resource usage at runtime, which means that the graphs of multiple services can execute concurrently without interference using cooperative scheduling. We evaluate FLICK with several services (an HTTP load-balancer, a Memcached router and a Hadoop data aggregator), showing that it achieves good performance while reducing development effort. Engineering and Physical Sciences Research Council. This is the author accepted manuscript. The final version is available from USENIX via https://www.usenix.org/conference/atc16/technical-sessions/presentation/ali

Nottingham ePrints

Nottingham eTheses

Kent Academic Repository

Spiral - Imperial College Digital Repository

Apollo (Cambridge)

Queen Mary Research Online

Extending programs with debug-related features, with application to hardware development

Author: Clegg R
Crowcroft J
Galea S
Greaves DJ
Mai L
Moore AW
Mortier R
Pietzuch P
Sultana N
Wojcik M
Zilberman N
Publication venue: arXiv
Publication date: 01/01/2017

The capacity and programmability of reconfigurable hardware such as FPGAs have improved steadily over the years, but such hardware does not readily provide any mechanisms for monitoring or debugging running programs. Such mechanisms need to be written into the program itself, and compared to CPU programming this is done using ad hoc methods and primitive tools. This complicates the programming and debugging of reconfigurable hardware. We introduce Program-hosted Directability (PhD), the extension of programs to interpret direction commands at runtime to enable debugging, monitoring and profiling. Normally in hardware development such features are fixed at compile time. We present a language of directing commands, specify its semantics in terms of a simple controller that is embedded with programs, and implement a prototype for directing network programs running in hardware. We show that this approach affords significant flexibility with low impact on hardware utilisation and performance. This work has received funding from the EPSRC NaaS grant EP/K034723/1, European Union's Horizon 2020 research and innovation programme 2014-2018 under the SSICLOPS (grant agreement No. 644866), the Leverhulme Trust Early Career Fellowship ECF-2016-289 and the Newton Trust.

arXiv.org e-Print Archive

Apollo (Cambridge)
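The Program-hosted Directability idea (a program that interprets direction commands at runtime through an embedded controller) can be mimicked in software. The sketch below uses a hypothetical three-command language (`count on|off`, `watch <value>`, `reset`); it is not the command language defined in the paper, and it runs on a CPU rather than in hardware.

```python
from typing import List, Optional

class Controller:
    """Hypothetical embedded controller: interprets direction commands at runtime."""
    def __init__(self):
        self.counting = False
        self.count = 0
        self.watch: Optional[int] = None
        self.trace: List[int] = []

    def direct(self, command: str) -> None:
        op, *args = command.split()
        if op == "count":
            self.counting = args[0] == "on"
        elif op == "watch":
            self.watch = int(args[0])
        elif op == "reset":
            self.count = 0
            self.trace.clear()
        else:
            raise ValueError(f"unknown command: {command}")

    def observe(self, value: int) -> None:
        """Hook the program calls on every item; behaviour depends on current directions."""
        if self.counting:
            self.count += 1
        if self.watch is not None and value == self.watch:
            self.trace.append(value)

def program(packets: List[int], ctl: Controller) -> List[int]:
    out = []
    for p in packets:
        ctl.observe(p)
        out.append(p * 2)  # the program's actual work, unchanged by directions
    return out
```

The point of the design is that monitoring behaviour changes through `direct(...)` calls while the program runs, instead of being fixed when the program is compiled.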